Sparse local feature extraction is widely regarded as important for typical vision tasks such as simultaneous localization and mapping, image matching, and 3D reconstruction. It still has deficiencies that need improvement, chiefly the discriminative power of extracted local descriptors, the localization accuracy of detected keypoints, and the efficiency of local feature learning. This paper focuses on advancing the currently popular approach of learning sparse local features with camera pose supervision. To that end, it proposes a Shared Coupling-bridge scheme with four lightweight yet effective improvements for weakly-supervised local feature (SCFeat) learning: i) the \emph{Feature-Fusion-ResUNet Backbone} (F2R-Backbone) for learning local descriptors, ii) a shared coupling-bridge normalization to improve the decoupled training of the description network and the detection network, iii) an improved detection network with peakiness measurement to detect keypoints, and iv) the fundamental matrix error as a reward factor to further optimize the training of feature detection. Extensive experiments show that the SCFeat improvements are effective. SCFeat often obtains state-of-the-art performance on classic image matching and visual localization, and it still achieves competitive results on 3D reconstruction. Our source code is available at https://github.com/sunjiayuanro/SCFeat.git.
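To make item iv) concrete, here is a minimal sketch of one common way to score a keypoint match against a fundamental matrix, the Sampson (first-order epipolar) error. The abstract does not say which error or reward shape SCFeat uses, so the choice of Sampson error, the function names, and the exponential mapping in `epipolar_reward` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Matrix3 = Sequence[Sequence[float]]


def mat_vec(m: Matrix3, v: Sequence[float]) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(m: Matrix3) -> List[List[float]]:
    """Transpose a 3x3 matrix."""
    return [[m[c][r] for c in range(3)] for r in range(3)]


def sampson_error(F: Matrix3, p1: Sequence[float], p2: Sequence[float]) -> float:
    """Sampson error of the match p1 (image 1) <-> p2 (image 2) under F.

    Uses the convention x2^T F x1 = 0 for a perfect correspondence.
    """
    x1 = [p1[0], p1[1], 1.0]  # homogeneous pixel coordinates
    x2 = [p2[0], p2[1], 1.0]
    Fx1 = mat_vec(F, x1)
    Ftx2 = mat_vec(transpose(F), x2)
    residual = sum(a * b for a, b in zip(x2, Fx1))
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    if denom == 0.0:
        raise ValueError("degenerate configuration: zero epipolar gradient")
    return residual ** 2 / denom


def epipolar_reward(F: Matrix3, p1: Sequence[float], p2: Sequence[float],
                    scale: float = 1.0) -> float:
    """Hypothetical reward: 1.0 for a perfect match, decaying with the error."""
    return math.exp(-sampson_error(F, p1, p2) / scale)


# Fundamental matrix of a pure translation along x (identity intrinsics):
# matching points must then share the same y coordinate.
F_TRANSLATION_X = [[0.0, 0.0, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0]]

good = sampson_error(F_TRANSLATION_X, (0.0, 0.0), (3.0, 0.0))  # same row
bad = sampson_error(F_TRANSLATION_X, (1.0, 2.0), (4.0, 3.0))   # one row off
```

In this toy setup a match that stays on its epipolar line has zero error and full reward, while a match that is one pixel off the line is penalized. In a real pipeline the fundamental matrix would come from the camera poses and intrinsics.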
translated by 谷歌翻译
Magnetic resonance (MR) and computed tomography (CT) images are two typical types of medical images that provide mutually complementary information for accurate clinical diagnosis and treatment. However, obtaining both images may be limited by considerations such as cost, radiation dose, and missing modalities. Recently, medical image synthesis has attracted growing research interest as a way to cope with this limitation. In this paper, we propose a bidirectional learning model, denoted dual contrast cycleGAN (DC-cycleGAN), to synthesize medical images from unpaired data. Specifically, a dual contrast loss is introduced into the discriminators to indirectly build constraints between real source and synthetic images: samples from the source domain serve as negative samples, which forces the synthetic images to fall far away from the source domain. In addition, cross-entropy and the structural similarity index (SSIM) are integrated into DC-cycleGAN so that both the luminance and the structure of samples are considered when synthesizing images. The experimental results indicate that DC-cycleGAN produces promising results compared with other cycleGAN-based medical image synthesis methods such as cycleGAN, RegGAN, DualGAN, and NiceGAN. The code will be available at https://github.com/JiayuanWang-JW/DC-cycleGAN.
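For readers unfamiliar with SSIM, the sketch below computes a simplified, single-window version of the index over two equally sized images given as flat lists of intensities in [0, 1]. It is only meant to show how SSIM combines luminance (means) with structure (variances and covariance). The function name `global_ssim` is hypothetical, the constants follow the commonly used defaults k1 = 0.01 and k2 = 0.03, and the abstract does not say how DC-cycleGAN windows or weights the term, so none of this should be read as the authors' implementation.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 1.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM between two images given as flat intensity lists.

    Practical SSIM implementations average this quantity over local
    (often Gaussian-weighted) windows; a single global window is used
    here to keep the sketch short.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


image = [0.1, 0.5, 0.9, 0.3]
inverted = [1.0 - v for v in image]

same_score = global_ssim(image, image)      # identical images
diff_score = global_ssim(image, inverted)   # contrast-inverted image
```

An image compared with itself scores 1 up to floating-point rounding, while the contrast-inverted image scores lower because its covariance with the original is negative. A loss term would typically use something like `1 - SSIM`.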
Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data they generate are high-dimensional, sparse, and heterogeneous, and have complicated dependency structures, which makes analysis with conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey of deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning, including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning different stages of the single-cell analysis pipeline: multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. For each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.
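Document-level information extraction (IE) tasks have recently begun to be seriously revisited using end-to-end neural network techniques, as their sentence-level IE counterparts have been. Evaluation of these approaches, however, has been limited in a number of dimensions. In particular, precision/recall/F1 scores are typically reported, with little insight into the range of errors that the models make. Building on the work of Kummerfeld and Klein (2013), we propose a transformation-based framework for automated error analysis of document-level event and (N-ary) relation extraction. We employ our framework to compare two state-of-the-art document-level template-filling approaches on datasets from three domains; then, to gauge progress in IE since its inception 30 years ago, we compare them with four systems from the MUC-4 (1992) evaluation.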
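A small worked example shows why aggregate scores alone give little insight into the kinds of errors a model makes. The two systems and their counts below are invented purely for illustration and do not come from the paper.

```python
from typing import Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts (each defined as 0.0 when its denominator is 0)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical system A over-generates: it finds every gold slot filler
# but also emits as many spurious ones.
p_a, r_a, f1_a = prf1(tp=50, fp=50, fn=0)

# Hypothetical system B under-generates: everything it emits is correct,
# but it misses half of the gold slot fillers.
p_b, r_b, f1_b = prf1(tp=50, fp=0, fn=50)
```

Both hypothetical systems reach the same F1 of 2/3, yet one fails only through spurious predictions and the other only through missed ones. Telling such cases apart, at a finer grain, is what an error-analysis framework is for.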
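We study a modular approach to long-horizon mobile manipulation tasks for object rearrangement, which decomposes the full task into a sequence of subtasks. To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, each learned individually on its subtask. Although more effective than monolithic end-to-end RL policies, this framework suffers from compounding errors in skill chaining, for example navigating to a bad location from which a stationary manipulation skill cannot reach its target. To this end, we propose that manipulation skills should include mobility, so that they can interact with the target object from multiple locations, and that the navigation skill may have multiple end points that lead to successful manipulation. We put these ideas into practice by implementing mobile manipulation skills rather than stationary ones and by training a navigation skill that accepts region goals instead of point goals. We evaluate our multi-skill mobile manipulation method, M3, on three challenging long-horizon mobile manipulation tasks in the Home Assistant Benchmark (HAB) and show superior performance compared to the baselines.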
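Semantic segmentation is a popular research topic in computer vision, and many efforts have been made on it with impressive results. In this paper, we intend to search for an optimal network structure that can run in real time for this problem. Towards this goal, we jointly search the depth, channels, dilation rate, and feature spatial resolution, which results in a search space of about 2.78*10^324 possible choices. To handle such a large search space, we leverage differentiable architecture search methods. However, the architecture parameters searched with existing differentiable methods need to be discretized, which causes a discretization gap between the architecture parameters found by these methods and the discretized version used as the final solution of the architecture search. We therefore address the discretization gap from the new perspective of solution space regularization. Specifically, a novel Solution Space Regularization (SSR) loss is first proposed to effectively encourage the supernet to converge to its discrete counterpart. Then, a new hierarchical and progressive solution space shrinking method is presented to further improve search efficiency. In addition, we theoretically show that optimizing the SSR loss is equivalent to L_0-norm regularization, which accounts for the improved search-evaluation gap. Comprehensive experiments show that the proposed search scheme can efficiently find an optimal network structure that achieves very fast segmentation (175 FPS) with a small model size (1 M) while maintaining comparable accuracy.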
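The discretization gap can be illustrated with a toy example. In differentiable architecture search, each decision is usually relaxed into a softmax over candidate operations, and the final network keeps only the highest-weight candidate. The sketch below measures how far the relaxed weights are from that one-hot choice. It does not implement the SSR loss, whose exact form is not given in the abstract; the function names and the L1 distance used as the gap measure are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate-operation logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discretize(weights: Sequence[float]) -> List[float]:
    """One-hot vector that keeps only the highest-weight candidate."""
    best = max(range(len(weights)), key=lambda i: weights[i])
    return [1.0 if i == best else 0.0 for i in range(len(weights))]


def discretization_gap(weights: Sequence[float]) -> float:
    """L1 distance between the relaxed weights and their one-hot version."""
    return sum(abs(w - d) for w, d in zip(weights, discretize(weights)))


# A supernet that has not committed to any candidate: large gap.
undecided = softmax([0.0, 0.0, 0.0])
gap_undecided = discretization_gap(undecided)

# A supernet whose weights are already close to one-hot: small gap.
committed = softmax([10.0, 0.0, 0.0])
gap_committed = discretization_gap(committed)
```

With uniform weights over three candidates the gap is 4/3, whereas nearly one-hot weights give a gap close to zero. A regularizer that pushes the supernet towards the second situation reduces the mismatch between the searched and the evaluated architecture.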
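We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions. We formulate this problem as a sequential prediction task: at each step, our model reads the manual, locates the components to be added to the current shape, and infers their 3D poses. This task poses two challenges: establishing 2D-3D correspondence between the manual image and the real 3D object, and estimating 3D poses for unseen 3D objects, since a new component added in a step can be an object built in previous steps. To address these two challenges, we present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images. The key idea is to integrate neural 2D keypoint detection modules and 2D-3D projection algorithms for high-precision prediction and strong generalization to unseen components. MEPNet outperforms existing methods on three newly collected LEGO manual datasets and a Minecraft house dataset.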
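Few-shot fine-grained learning aims to classify a query image into one of a set of support categories with fine-grained differences. Although learning the local differences of different objects with deep neural networks has been successful, how to exploit query-support cross-image object semantic relations in Transformer-based architectures remains under-explored in the few-shot fine-grained scenario. In this work, we propose a Transformer-based double-helix model, namely HelixFormer, to achieve cross-image object semantic relation mining in a bidirectional and symmetrical manner. HelixFormer consists of two steps: 1) a Relation Mining Process (RMP) across the different branches, and 2) a Representation Enhancement Process (REP) within each branch. With the designed RMP, each branch can extract fine-grained object-level Cross-image Semantic Relation Maps (CSRMs) using information from the other branch, ensuring better cross-image interaction in semantically related local object regions. Further, with the aid of the CSRMs, the developed REP strengthens the extracted features of the discovered semantically related local regions in each branch, boosting the model's ability to distinguish subtle feature differences between fine-grained objects. Extensive experiments on five public fine-grained benchmarks demonstrate that HelixFormer can effectively enhance cross-image object semantic relation matching for recognizing fine-grained objects, achieving better performance than most state-of-the-art methods under both 1-shot and 5-shot scenarios. Our code is available at: https://github.com/jiakangyuan/helixformer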
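We introduce Programmatic Motion Concepts, a hierarchical motion representation for human actions that captures both low-level motion and high-level description as motion concepts. This representation enables human motion description, interactive editing, and controlled synthesis of novel video sequences within a single framework. We present an architecture that learns this concept representation from paired video and action sequences in a semi-supervised manner. The compactness of our representation also allows us to present a low-resource training recipe for data-efficient learning. By outperforming established baselines, especially in the small-data regime, we demonstrate the efficiency and effectiveness of our framework for multiple applications.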
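Recently, self-supervised representation learning (SSRL) has attracted much attention in computer vision, speech, natural language processing (NLP), and, more recently, in other types of modalities, including time series from sensors. The popularity of self-supervised learning is driven by the fact that traditional models typically require a large amount of annotated data for training, and acquiring annotated data can be a difficult and costly process. Self-supervised methods have been introduced to improve the efficiency of training data through discriminative pre-training of models using supervisory signals that are freely obtained from the raw data. Unlike existing reviews of SSRL, which focus on methods in the fields of CV or NLP for a single modality, we aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data. To this end, we 1) provide a comprehensive categorization of existing SSRL methods, 2) introduce a generic pipeline by defining the key components of an SSRL framework, 3) compare existing models in terms of their objective function, network architecture, and potential applications, and 4) review existing multimodal techniques in each category and across various modalities. Finally, we present existing weaknesses and future opportunities. We believe our work develops a perspective on the requirements of SSRL in domains that use multimodal and/or temporal data.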